User:Was a bee/Gene
1. Test
A test of automatically placing a marker icon on a chromosome ideogram image to show the location of a gene (based on base-pair position data stored in Wikidata).
In other words, the goal is to generate an image like the one below automatically.
Currently Wikimedia Commons has about 100 ideogram images that are used to show gene positions. See commons:category:Human chromosome ideograms which indicates gene location.
2. Result
Bp start | 155,799,986 |
Bp end | 155,812,273 |
Good (see the test case box at the right)
3. Calculation detail
The position where the marker should be put is calculated as follows. Although the math expressions look somewhat complex, the actual calculation is not. The only concepts used are addition and subtraction for calculating lengths, and multiplication and division for scaling.
The other ideas used here are as follows. The sum of gene-start and gene-end is divided by 2 to get the midpoint of the gene. Arrow-width is divided by 2 because an image is positioned by its leftmost point, not by its center. The conditional branch for arrow-width is only a technical detail: it selects a rectangle of the appropriate shape from the commons:Template:Red rectangle series as the marker, depending on the length of the target gene.
Calculation algorithm is as follows....
Where red terms are variables retrieved from Wikidata, the blue term is the term calculated based on variables retrieved from Wikidata, and other black terms are constants.
- Gene start position, measured from the terminus of the p-arm. (unit: basepair, example: 155799986 from wikidata:Q14860072)
- Gene end position, measured from the terminus of the p-arm. (unit: basepair, example: 155812273 from wikidata:Q14860072)
- Length of the chromosome that contains the target gene. (unit: basepair, example: 159345973 for chromosome 7)
- Horizontal position of pter (tip of the p arm/short arm) in the ideogram image. (unit: pixel, example: 6 px)
- Horizontal position of qter (tip of the q arm/long arm) in the ideogram image. (unit: pixel, example: 1109 px)
- Width of the ideogram image. (unit: pixel, example: 1125 px)
- Shown width of the ideogram image in the Wikipedia page. (unit: pixel, example: 300 px)
- Calculated marker width, proportional to the gene length. This is generally too small (e.g. 0.05 px) and is not an integer. (unit: pixel)
- Actual marker width shown in the Wikipedia page. The minimum value is 2 and it is always an integer (using the ceiling function), that is, 2, 3, 4, 5, ... (unit: pixel)
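Written out in symbols, the calculation described by these definitions can be sketched as below. The symbol names are placeholders chosen for this sketch and are assumptions, not the names used in the module: s and e are the gene start and end, L is the chromosome length, x_p and x_q are the pixel positions of pter and qter, W is the image width, D is the shown width, w is the calculated marker width, a is the actual marker width and x is the left position of the marker. The formulas are inferred from the definitions and from the SHH example values (0.022 px and 288.2 px).

```latex
\begin{align}
w &= \frac{e - s}{L}\,(x_q - x_p)\,\frac{D}{W} \\
a &= \max\bigl(2,\ \lceil w \rceil\bigr) \\
x &= \left( x_p + \frac{(s + e)/2}{L}\,(x_q - x_p) \right)\frac{D}{W} - \frac{a}{2}
\end{align}
```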
By substituting the example values for the terms, we get...
The form in the last line of each equation is the one used in the program.
The actual calculation for the SHH gene (wikidata:Q14860072) is as follows.
First we need to calculate the arrow-width.
Here we got 0.022 px for the marker image width for the SHH gene. Is this wrong? No. This result comes from the fact that most genes are very short compared to the whole chromosome length. If the whole chromosome is shown in about 300 px, most human genes (≈10 kb) span only 0.01 px to 0.05 px, depending on the whole chromosome length. So the third equation does the job here: the calculated width, 0.022 px, is below the 2 px minimum. Hence we get....
The arrow-width is 2 px. We can then calculate the arrow position using this value.
Thus we have the answer: the 2 px wide arrow should be placed at the 288.2 px position.
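The SHH calculation can be re-run with a short Python sketch. The function name `marker_geometry` and its parameter names are hypothetical and chosen only for this illustration. The default pixel values are the example values listed above, and the logic is a reconstruction from the description, not the code of Module:Infobox gene.

```python
import math

def marker_geometry(gene_start, gene_end, chrom_length,
                    pter_px=6, qter_px=1109, image_width=1125,
                    shown_width=300, min_width=2):
    """Return (marker_width_px, marker_left_px) in displayed pixels."""
    scale = shown_width / image_width        # original pixels -> displayed pixels
    chrom_px = (qter_px - pter_px) * scale   # displayed length of the chromosome
    # Width proportional to gene length; usually far below one pixel.
    raw_width = (gene_end - gene_start) / chrom_length * chrom_px
    # Integer width via ceiling, never below the 2 px minimum.
    width = max(min_width, math.ceil(raw_width))
    # Center of the marker: displayed pter offset plus the scaled gene midpoint.
    midpoint = (gene_start + gene_end) / 2
    center = pter_px * scale + midpoint / chrom_length * chrom_px
    # Images are positioned by their left edge, so shift by half the width.
    left = center - width / 2
    return width, left

# SHH on chromosome 7 (values from wikidata:Q14860072 as quoted above).
width, left = marker_geometry(155799986, 155812273, 159345973)
print(width, round(left, 1))  # prints "2 288.2"
```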
The position coordinate runs from the left (0 px) to the right (300 px).
4. See also
Effort for reader-friendliness for general readers
Introductory gene textbook website by the National Library of Medicine. It includes gene location data on each gene page.
5. Test with Module
At Module:Infobox gene/sandbox2
{{#invoke:Infobox gene/sandbox2|getTemplateData}}
https://en.wikipedia.org/w/index.php?title=Sonic_hedgehog&diff=prev&oldid=795122559
Category:Pages with script errors - Article namespace
6. On the width of the marker
Basepairs | Approx. width in Chr.1 (Longest chromosome) | Approx. width in Chr.21 (Shortest chromosome) |
---|---|---|
248.9Mb (Chr.1 length) | 300px | 1599px |
46.7Mb (Chr.21 length) | 56px | 300px |
5Mb | 6px | 32px |
2.3Mb (Longest human gene length) | 2.8px | 15px |
1.6Mb | 2px | 10.6px |
0.93Mb | 1.1px | 6px |
0.31Mb | 0.38px | 2px |
10kb (Typical human gene length) | 0.012px | 0.064px |
Gene length varies from gene to gene, so the marker width also has to change from page to page. Here I list some data needed to think about the marker width.
Some experiments showed that the marker width must be at least 2px, because a 1px marker is difficult to detect.
The largest marker width would be 15px (red area in the table at the right).
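The widths in the table can be approximated with a simple proportion. The sketch below assumes, as the table appears to, that the whole chromosome spans the full 300px and ignores the small margins before pter and after qter. The constant and function names are hypothetical.

```python
CHR1_BP = 248.9e6   # chromosome 1 length quoted in the table
CHR21_BP = 46.7e6   # chromosome 21 length quoted in the table
SHOWN_PX = 300      # displayed ideogram width

def approx_width_px(length_bp, chrom_bp, shown_px=SHOWN_PX):
    """Pixels covered by length_bp when chrom_bp spans shown_px."""
    return length_bp / chrom_bp * shown_px

# Print approximate widths on Chr.1 and Chr.21 for a few lengths.
for label, bp in [("5 Mb", 5e6), ("2.3 Mb", 2.3e6), ("10 kb", 10e3)]:
    print(label,
          round(approx_width_px(bp, CHR1_BP), 3),
          round(approx_width_px(bp, CHR21_BP), 3))
```

With these inputs the longest human genes (2.3 Mb) come out just under 15px on chromosome 21, which is where the 15px upper bound comes from.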
The lengths of the longest human genes are as follows (source: http://www.cshlp.org/ghg5_all/section/gene.shtml).
- Dystrophin - 2.3 Mb at Chr.X
- CNTNAP2 - 2.3 Mb at Chr.7
- PTPRD - 2.3 Mb at Chr.9
- RUNX1 - 1.2 Mb at Chr.21
- LARGE - 760 kb at Chr.22
Red rectangle series |
---|
|
7. Ideograms
Human chromosome ideograms in svg | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
X |
Y |
The ideogram set currently in use is shown above. If you want to use a different ideogram set, the following 5 conditions must be met.
- 24 images are needed. (1-22, X and Y)
- All 24 images must have the same image size (same height and same width).
- In all 24 images, pter (terminus of the p-arm, leftmost point) and qter (terminus of the q-arm, rightmost point) must be set at the same position.
- The banding pattern must be drawn in basepair-proportional style. Standard ideograms defined by ISCN are drawn based on the actual visual appearance of stained chromosomes under a microscope and are not basepair-proportional. (see the table below)
- All file names must have the same format, differing only in the chromosome number. For example, if you created a chromosome 1 image named MyPrettyNice_Chr1_Ideogram.png, then the rest of the file names should be as follows.
MyPrettyNice_Chr2_Ideogram.png
MyPrettyNice_Chr3_Ideogram.png
MyPrettyNice_Chr4_Ideogram.png
- ....
MyPrettyNice_Chr9_Ideogram.png
MyPrettyNice_Chr10_Ideogram.png
- ....
MyPrettyNice_Chr22_Ideogram.png
MyPrettyNice_ChrX_Ideogram.png
MyPrettyNice_ChrY_Ideogram.png
Once these 5 conditions are met, you can switch from the current images to the new images by changing the part of the code where the ideogram file name is defined.
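The naming condition can be checked by generating all 24 expected file names from a single pattern. This is a minimal sketch: the function name `ideogram_file_names` and the `{chrom}` placeholder are hypothetical, and the pattern is the example name used above.

```python
def ideogram_file_names(template="MyPrettyNice_Chr{chrom}_Ideogram.png"):
    """Map each chromosome label (1-22, X, Y) to its expected file name."""
    chromosomes = [str(n) for n in range(1, 23)] + ["X", "Y"]
    return {chrom: template.format(chrom=chrom) for chrom in chromosomes}

names = ideogram_file_names()
print(len(names))    # 24 images are needed
print(names["10"])   # MyPrettyNice_Chr10_Ideogram.png
```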
Ideogram | Description | Image | In common | Difference | Cause |
---|---|---|---|---|---|
We cannot use this type | Chr.7 ideogram of the ISCN standard, which is drawn based on the actual visual appearance of the stained chromosome under a microscope. | | In both images, the band order is the same: the band colors, from left to right, appear in the same order. | The order is the same, but the widths of the bands are different. The salient parts are highlighted in the image below. Bands connected to each other are the same band; you can see the difference in their widths between the upper and the lower ideogram. | The cause of this difference is that basepair density is not homogeneous within the chromosome. In some parts basepairs are densely packed, and in other parts they are sparsely packed. |
We can use this type | Chr.7 ideogram drawn in basepair-proportional style. As far as I know, all genome browsers (e.g. Ensembl, UCSC and so on) use this style of ideogram. | | | | |
8. Forward and Reverse strands
https://www.biostars.org/p/210929/
https://www.biostars.org/p/3908/
http://seqanswers.com/forums/showthread.php?t=39388
In GRCh, by convention, the direction from the p-arm (short arm) to the q-arm (long arm) is forward. The opposite direction is reverse.
From Nelson, Sarah C., et al. Trends in Genetics 28.8 (2012): 361-363.[1]
In all human reference chromosomes, as for other eukaryotes, the plus (+) strand is defined as the strand with its 5' end at the tip of the short arm (Genome Reference Consortium, personal communication, March 27, 2012).
- Sense (molecular biology)
- Directionality (molecular biology)
- Upstream and downstream (DNA)
- Sense strand
- Coding strand
- Reference genome
9. Others
It would be very nice if the following kind of technology were available.
But currently no such technology seems to exist.
Perhaps Graph extension can do something...